Visual Intelligence


How to use Visual Intelligence on your iPhone with iOS 26

Popular Science

Your Apple phone has some new AI powers with iOS 26 and Visual Intelligence. By now you should've upgraded to iOS 26 on your iPhone, and the update is a big one. In addition to rolling out an entirely new look (called Liquid Glass), iOS 26 introduces a host of new and upgraded features, from a new battery-saving mode to a mobile version of the classic Preview Mac app. Another change ushered in by iOS 26 is an expanded Visual Intelligence tool, part of Apple Intelligence.


The iPhone 15 Pro will get Visual Intelligence with iOS 18.4

Engadget

What started as an Apple Intelligence feature exclusive to the Camera Control-endowed iPhone 16 line is coming to older iPhones, and soon. We already knew that the iPhone 15 Pro and Pro Max would get Visual Intelligence at some point in the future, and thanks to 9to5Mac, we now know it's one of several options you can assign to the Action Button in the second iOS 18.4 beta. That likely means the feature could end up in the final release of the update. Visual Intelligence lets you draw on AI models from Google and OpenAI to find information (and websites) about anything you point your iPhone's camera at. You can also use the feature to add information from a flyer to your calendar and, oddly, identify dog breeds.


The Humane Ai Pin Will Become E-Waste Next Week

WIRED

The story of the infamous Humane Ai Pin is coming to an end. This week, the company announced that HP--known for its computers and printers that always seem to need a refill--will acquire several assets from Humane in a $116 million deal expected to close at the end of the month. HP will get more than 300 patents and patent applications, a few Humane employees--including founders Imran Chaudhri and Bethany Bongiorno--and Humane's Cosmos operating system. Late in 2024, Humane looked to license this operating system so that third parties could inject the AI voice assistant into other products, like cars. Humane became Silicon Valley's "next big thing" in late 2023 when it unveiled its AI wearable, equipped with a ChatGPT-powered assistant and a laser-projected display, that promised to replace your smartphone.


An open-source training framework to advance multimodal AI

AIHub

Trying to model physical reality by assembling various modalities: the image shows a couple of oranges seen through the lens of multiple modalities, with each slice showing a different way one might perceive and understand this scene. The modalities, from left to right, represent surface normals (the color represents surface orientation), depth (distance to the camera; red near, blue far), RGB (the original image), segmentation (distinct objects and image regions), and edges (object or texture boundaries). Large Language Models such as OpenAI's ChatGPT have already transformed the way many of us go about some of our daily tasks. These generative artificial intelligence chatbots are trained with language -- hundreds of terabytes of text 'scraped' from across the Internet -- and with billions of parameters. Looking ahead, many believe the 'engines' that drive generative artificial intelligence will be multimodal models that are not just trained on text but can also process various other modalities of information, including images, video, sound, and modalities from other domains such as biological or atmospheric data. Yet, until recently, training a single model to handle a wide range of modalities (inputs) and tasks (outputs) faced significant challenges.
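The spatially aligned per-pixel modalities described above can be pictured as a set of same-sized maps over one scene. A minimal sketch, assuming NumPy and using hypothetical names (this is not the framework's actual API), might look like:

```python
import numpy as np

# Hypothetical sketch: one scene represented through several aligned
# "modalities", each stored as a per-pixel map of the same spatial size.
H, W = 64, 64
scene = {
    "rgb": np.zeros((H, W, 3), dtype=np.uint8),        # original image
    "depth": np.zeros((H, W), dtype=np.float32),       # distance to camera
    "normals": np.zeros((H, W, 3), dtype=np.float32),  # surface orientation
    "segmentation": np.zeros((H, W), dtype=np.int32),  # object/region ids
    "edges": np.zeros((H, W), dtype=bool),             # boundary mask
}

# A multimodal model takes any subset of modalities as inputs and
# predicts the remaining ones as outputs; every map is spatially aligned.
inputs = {k: scene[k] for k in ("rgb",)}
targets = {k: v for k, v in scene.items() if k not in inputs}

# All modalities share the same spatial grid.
assert all(v.shape[:2] == (H, W) for v in scene.values())
```

The point of the dictionary layout is that any modality can serve as either input or output, which is the flexibility the article attributes to multimodal training.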


How to use Visual Intelligence, Apple's take on Google Lens

Engadget

The recent rollout of iOS 18.2 finally brings many of the promised Apple Intelligence features, like Genmoji and Image Playground. One such long-awaited tool is Visual Intelligence, a feature currently reserved for the iPhone 16 Pro and Pro Max that was first introduced at the company's September event. Visual Intelligence is Apple's answer to Google Lens. It leverages the camera system and AI to analyze images in real time and provide useful information. This can help people learn more about the world around them and is particularly handy for shopping, looking up details about a restaurant or business, translating written text, summarizing text, or having something read aloud.


Apple Intelligence expands in iOS 18.2 developer beta, adding Genmoji, Visual Intelligence and ChatGPT

Engadget

The Apple Intelligence rollout has been slow, staggered and steady since the company first unveiled its take on AI at WWDC this year. It continues today with the release of the latest developer betas for iOS 18, iPadOS 18 and macOS Sequoia. The updates in iOS 18.2, iPadOS 18.2 and macOS Sequoia 15.2 bring long-awaited features like Genmoji, Image Playground, Visual Intelligence and ChatGPT integration for those running the preview software, as well as Image Wand for iPads and more writing tools. This follows the announcement that iOS 18.1 would be available as a stable release to the public next week, bringing things like writing tools, notification summaries and Apple's hearing test to the masses. That will mark the first chance for people who haven't opted into beta software to check out Apple Intelligence, which the company has widely touted as the headline feature for the devices it launched this year. The iPhone 16 series, for example, was billed as a line of phones designed for Apple Intelligence, though it launched without those features.


A newborn embodied Turing test for view-invariant object recognition

Pak, Denizhan, Lee, Donsuk, Wood, Samantha M. W., Wood, Justin N.

arXiv.org Artificial Intelligence

Recent progress in artificial intelligence has renewed interest in building machines that learn like animals. Almost all of the work comparing learning across biological and artificial systems comes from studies where animals and machines received different training data, obscuring whether differences between animals and machines emerged from differences in learning mechanisms versus training data. We present an experimental approach-a "newborn embodied Turing Test"-that allows newborn animals and machines to be raised in the same environments and tested with the same tasks, permitting direct comparison of their learning abilities. To make this platform, we first collected controlled-rearing data from newborn chicks, then performed "digital twin" experiments in which machines were raised in virtual environments that mimicked the rearing conditions of the chicks. We found that (1) machines (deep reinforcement learning agents with intrinsic motivation) can spontaneously develop visually guided preference behavior, akin to imprinting in newborn chicks, and (2) machines are still far from newborn-level performance on object recognition tasks. Almost all of the chicks developed view-invariant object recognition, whereas the machines tended to develop view-dependent recognition. The learning outcomes were also far more constrained in the chicks versus machines. Ultimately, we anticipate that this approach will help researchers develop embodied AI systems that learn like newborn animals.


Metaverse insurance preparations, AI, AR, VR

#artificialintelligence

The metaverse is a broad term that refers to a multitude of technologies, some of them including augmented reality (AR) and virtual reality (VR). The idea is that we can perform regular activities like shopping, meeting people, and more in a digital space that doesn't exist physically. Although interest in the metaverse has picked up recently, the idea is not as new as it seems. It actually goes back to 1968, when Ivan Sutherland built the first VR machine at Harvard, and its adoption in industries like gaming and shopping is already widespread. What is new is the rise of industries like insurance, finance, retail, and more getting into the metaverse.


Towards a Framework for Visual Intelligence in Service Robotics: Epistemic Requirements and Gap Analysis

Chiatti, Agnese, Motta, Enrico, Daga, Enrico

arXiv.org Artificial Intelligence

A key capability required by service robots operating in real-world, dynamic environments is that of Visual Intelligence, i.e., the ability to use their vision system, reasoning components and background knowledge to make sense of their environment. In this paper, we analyze the epistemic requirements for Visual Intelligence, both in a top-down fashion, using existing frameworks for human-like Visual Intelligence in the literature, and from the bottom up, based on the errors emerging from object recognition trials in a real-world robotic scenario. Finally, we use these requirements to evaluate current knowledge bases for Service Robotics and to identify gaps in the support they provide for Visual Intelligence. These gaps provide the basis of a research agenda for developing more effective knowledge representations for Visual Intelligence.


SAFR Transforms Global Cities to Be Smarter and More Secure with AI-Based Video Intelligence Powered by NVIDIA

#artificialintelligence

SEATTLE, Oct. 30, 2019 (GLOBE NEWSWIRE) -- RealNetworks (RNWK) today announced that SAFR, the foremost AI platform for live video, has joined NVIDIA's Metropolis Software Partner Program to make it simpler for system integrators and enterprise customers to deploy the leading visual intelligence offering powered by NVIDIA technology. Smart city professionals have historically struggled to find and deploy a U.S.-based computer vision solution that works in variable lighting conditions, angles of view, and high population density scenes. Now, those customers have easy access to SAFR's industry-leading accuracy and performance in a highly optimized, scalable and extensible format by becoming a part of NVIDIA's Metropolis program. As the world's highest-performance computer vision solution for live video, SAFR instantly detects and matches millions of images of people with near-perfect accuracy in a fraction of a second -- even when they are blurred, obscured, tilted, or dimly lit. SAFR is also capable of assessing demographics, sentiment, or a person's line-of-sight, without collecting any personally identifiable information.